This exploratory analysis of America’s favorite pastime takes an in-depth look at the history of Negro League play and the statistics its players generated.
The data we are using come from Baseball Reference, though they were compiled by Seamheads from newspapers and other primary sources of the time. It is important to note that the data set is far from complete, because box scores and record keeping were not as comprehensive in the Negro Leagues as they were in the American and National Leagues. In addition, Negro League teams played many exhibition, barnstorming, and other types of games that are not included in the data set.
Our data set covers Negro League pitchers and position players in the Baseball Reference database from 1920 to 1948 (the period for which MLB has designated the Negro Leagues as major leagues), with the aforementioned caveats. It includes all hitters with at least 100 plate appearances (PA) and all pitchers who appeared in at least 10 games. It contains only the career statistics each player put up while in the Negro Leagues (for instance, Jackie Robinson’s statistics with the Dodgers are not included).
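The inclusion thresholds described above amount to two simple row filters. Here is a minimal sketch in base R, assuming hypothetical data frames named `hitters` and `pitchers` with `PA` and `G` columns; the rows are invented placeholders, not real players.

```r
# Toy stand-ins for the raw exports; all values are illustrative only.
hitters <- data.frame(
  Player = c("Hitter A", "Hitter B", "Hitter C"),
  PA     = c(99, 100, 374)
)
pitchers <- data.frame(
  Player = c("Pitcher A", "Pitcher B", "Pitcher C"),
  G      = c(9, 10, 35)
)

# Keep hitters with at least 100 plate appearances
hitters_kept <- hitters[hitters$PA >= 100, ]

# Keep pitchers who appeared in at least 10 games
pitchers_kept <- pitchers[pitchers$G >= 10, ]

nrow(hitters_kept)
nrow(pitchers_kept)
```

Both thresholds are inclusive, so a hitter with exactly 100 PA or a pitcher with exactly 10 games stays in the set.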
The median number of plate appearances in our data set was 374, and the mean was 672.2. The median and mean on-base percentage plus slugging percentage (OPS) were both .655 (more on this statistic later). For pitchers, the median earned run average (ERA) was 4.50 and the mean was 4.74. The median fielding independent pitching (FIP) was 2.80 and the mean was 2.77.
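Figures like these can be obtained with base R's `median()`, `mean()`, and `summary()`. The sketch below uses invented vectors `pa` and `fip` (not values taken from the data set) to show how a few very long careers pull the mean well above the median, and why `na.rm = TRUE` is needed when a column has missing entries.

```r
# Illustrative plate-appearance counts with one very long career
pa <- c(100, 196, 374, 803, 5405)

median(pa)   # middle value, unaffected by the long career
mean(pa)     # pulled upward by the single large value
summary(pa)  # min, quartiles, median, mean, and max in one call

# Illustrative FIP values with one missing entry
fip <- c(2.0, 2.8, NA, 3.6)

mean(fip)                # NA, because one value is missing
mean(fip, na.rm = TRUE)  # mean of the non-missing values
```

A mean far above the median, as with plate appearances here, is the signature of a right-skewed distribution: most players had short recorded careers and a few had very long ones.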
## Rk Player OPS. PA
## Min. : 1.0 Length:864 Min. :-33.00 Min. : 100.0
## 1st Qu.:216.8 Class :character 1st Qu.: 58.00 1st Qu.: 196.0
## Median :432.5 Mode :character Median : 81.00 Median : 374.0
## Mean :432.5 Mean : 80.62 Mean : 672.2
## 3rd Qu.:648.2 3rd Qu.:105.25 3rd Qu.: 803.5
## Max. :864.0 Max. :214.00 Max. :5405.0
##
## From To Age G
## Min. :1920 Min. :1920 Length:864 Min. : 23.00
## 1st Qu.:1923 1st Qu.:1929 Class :character 1st Qu.: 60.75
## Median :1928 Median :1937 Mode :character Median : 113.00
## Mean :1930 Mean :1937 Mean : 179.35
## 3rd Qu.:1937 3rd Qu.:1946 3rd Qu.: 228.50
## Max. :1948 Max. :1948 Max. :1199.00
##
## PA.1 AB R H
## Min. : 100.0 Min. : 81.0 Min. : 1.00 Min. : 10.0
## 1st Qu.: 196.0 1st Qu.: 176.0 1st Qu.: 22.00 1st Qu.: 41.0
## Median : 374.0 Median : 337.5 Median : 45.00 Median : 86.5
## Mean : 672.2 Mean : 596.7 Mean : 94.71 Mean : 165.2
## 3rd Qu.: 803.5 3rd Qu.: 719.0 3rd Qu.: 111.00 3rd Qu.: 198.0
## Max. :5405.0 Max. :4756.0 Max. :1149.00 Max. :1546.0
##
## X1B X2B X3B HR
## Min. : 7.0 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 33.0 1st Qu.: 5.00 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 68.0 Median : 12.00 Median : 4.000 Median : 2.000
## Mean : 124.6 Mean : 24.74 Mean : 8.594 Mean : 7.292
## 3rd Qu.: 150.2 3rd Qu.: 28.00 3rd Qu.: 10.000 3rd Qu.: 6.000
## Max. :1155.0 Max. :262.00 Max. :112.000 Max. :186.000
##
## RBI SB CS BB
## Min. : 1.00 Min. : 0.00 Min. :0.0000 Min. : 0.00
## 1st Qu.: 18.00 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 12.00
## Median : 37.00 Median : 5.00 Median :0.0000 Median : 26.00
## Mean : 83.51 Mean : 14.43 Mean :0.6286 Mean : 53.76
## 3rd Qu.: 91.00 3rd Qu.: 15.00 3rd Qu.:0.0000 3rd Qu.: 63.00
## Max. :1001.00 Max. :285.00 Max. :9.0000 Max. :527.00
## NA's :794
## SO BA OBP SLG
## Min. : 0.000 Min. :0.0970 Min. :0.1290 Min. :0.0980
## 1st Qu.: 4.500 1st Qu.:0.2230 1st Qu.:0.2810 1st Qu.:0.2840
## Median :10.000 Median :0.2580 Median :0.3165 Median :0.3390
## Mean : 9.667 Mean :0.2546 Mean :0.3133 Mean :0.3420
## 3rd Qu.:14.000 3rd Qu.:0.2900 3rd Qu.:0.3510 3rd Qu.:0.3962
## Max. :19.000 Max. :0.4080 Max. :0.4920 Max. :0.7200
## NA's :846
## OPS OPS..1 TB GIDP
## Min. :0.2390 Min. :-33.00 Min. : 10.00 Min. :0.0000
## 1st Qu.:0.5707 1st Qu.: 58.00 1st Qu.: 52.75 1st Qu.:0.0000
## Median :0.6550 Median : 81.00 Median : 112.00 Median :0.0000
## Mean :0.6553 Mean : 80.62 Mean : 229.03 Mean :0.0899
## 3rd Qu.:0.7370 3rd Qu.:105.25 3rd Qu.: 268.75 3rd Qu.:0.0000
## Max. :1.1780 Max. :214.00 Max. :2329.00 Max. :1.0000
## NA's :775
## HBP SH SF IBB
## Min. : 0.000 Min. : 0.00 Min. :0 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 4.00 1st Qu.:0 1st Qu.: 0.0000
## Median : 1.000 Median : 10.00 Median :0 Median : 0.0000
## Mean : 3.527 Mean : 18.23 Mean :0 Mean : 0.1533
## 3rd Qu.: 4.000 3rd Qu.: 22.00 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :39.000 Max. :160.00 Max. :0 Max. :11.0000
## NA's :17 NA's :669 NA's :603
## Pos Team Player.additional
## Length:864 Length:864 Length:864
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Rk Player ERA. G
## Min. : 1.0 Length:435 Min. : 42.00 Min. : 10.00
## 1st Qu.:109.5 Class :character 1st Qu.: 80.00 1st Qu.: 16.00
## Median :218.0 Mode :character Median : 94.00 Median : 35.00
## Mean :218.0 Mean : 98.31 Mean : 53.81
## 3rd Qu.:326.5 3rd Qu.:113.00 3rd Qu.: 74.50
## Max. :435.0 Max. :236.00 Max. :303.00
##
## From To Age W
## Min. :1920 Min. :1920 Length:435 Min. : 0.00
## 1st Qu.:1923 1st Qu.:1929 Class :character 1st Qu.: 4.00
## Median :1929 Median :1938 Mode :character Median : 10.00
## Mean :1931 Mean :1937 Mean : 17.84
## 3rd Qu.:1938 3rd Qu.:1946 3rd Qu.: 24.00
## Max. :1948 Max. :1948 Max. :121.00
##
## L W.L. Dec ERA
## Min. : 0.00 Min. :0.0000 Min. : 1.00 Min. : 1.700
## 1st Qu.: 5.00 1st Qu.:0.3640 1st Qu.: 9.00 1st Qu.: 3.805
## Median :11.00 Median :0.4680 Median : 21.00 Median : 4.500
## Mean :17.08 Mean :0.4603 Mean : 34.92 Mean : 4.738
## 3rd Qu.:24.00 3rd Qu.:0.5710 3rd Qu.: 50.50 3rd Qu.: 5.485
## Max. :89.00 Max. :1.0000 Max. :182.00 Max. :10.510
##
## G.1 GS CG SHO
## Min. : 10.00 Min. : 1.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 16.00 1st Qu.: 9.00 1st Qu.: 4.00 1st Qu.: 0.000
## Median : 35.00 Median : 20.00 Median : 10.00 Median : 1.000
## Mean : 53.81 Mean : 35.37 Mean : 20.86 Mean : 1.903
## 3rd Qu.: 74.50 3rd Qu.: 51.00 3rd Qu.: 31.00 3rd Qu.: 3.000
## Max. :303.00 Max. :183.00 Max. :139.00 Max. :31.000
##
## SV IP H R
## Min. : 0.000 Min. : 24.20 Min. : 31 Min. : 10.0
## 1st Qu.: 0.000 1st Qu.: 79.05 1st Qu.: 92 1st Qu.: 57.5
## Median : 1.000 Median : 175.20 Median : 194 Median :116.0
## Mean : 1.685 Mean : 303.47 Mean : 320 Mean :180.0
## 3rd Qu.: 2.000 3rd Qu.: 422.55 3rd Qu.: 440 3rd Qu.:258.5
## Max. :24.000 Max. :1603.10 Max. :1700 Max. :871.0
##
## ER HR BB IBB
## Min. : 8.0 Min. : 0.000 Min. : 7.0 Min. :0.0000
## 1st Qu.: 45.5 1st Qu.: 2.000 1st Qu.: 32.0 1st Qu.:0.0000
## Median : 93.0 Median : 5.000 Median : 66.0 Median :0.0000
## Mean :141.9 Mean : 9.424 Mean :101.7 Mean :0.1759
## 3rd Qu.:206.5 3rd Qu.:12.000 3rd Qu.:142.5 3rd Qu.:0.0000
## Max. :695.0 Max. :63.000 Max. :477.0 Max. :4.0000
## NA's :20 NA's :236
## SO HBP BK WP
## Min. : 6.0 Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 34.5 1st Qu.: 2.000 1st Qu.:0.0000 1st Qu.: 1.000
## Median : 81.0 Median : 5.000 Median :0.0000 Median : 2.000
## Mean : 146.3 Mean : 8.405 Mean :0.1522 Mean : 3.333
## 3rd Qu.: 199.5 3rd Qu.:12.000 3rd Qu.:0.0000 3rd Qu.: 5.000
## Max. :1192.0 Max. :47.000 Max. :2.0000 Max. :31.000
## NA's :146
## BF ERA..1 FIP WHIP
## Min. : 5.0 Min. : 42.00 Min. :0.020 Min. :0.951
## 1st Qu.: 304.2 1st Qu.: 80.00 1st Qu.:2.035 1st Qu.:1.329
## Median : 606.0 Median : 94.00 Median :2.800 Median :1.449
## Mean :1034.9 Mean : 98.31 Mean :2.770 Mean :1.481
## 3rd Qu.:1521.8 3rd Qu.:113.00 3rd Qu.:3.575 3rd Qu.:1.601
## Max. :5541.0 Max. :236.00 Max. :6.140 Max. :2.317
## NA's :17 NA's :20
## H9 HR9 BB9 SO9
## Min. : 6.400 Min. :0.0000 Min. : 1.100 Min. : 1.30
## 1st Qu.: 9.000 1st Qu.:0.1000 1st Qu.: 2.600 1st Qu.: 3.40
## Median : 9.800 Median :0.3000 Median : 3.200 Median : 4.20
## Mean : 9.929 Mean :0.2959 Mean : 3.398 Mean : 4.22
## 3rd Qu.:10.800 3rd Qu.:0.4000 3rd Qu.: 3.950 3rd Qu.: 4.85
## Max. :17.800 Max. :1.4000 Max. :10.100 Max. :10.10
## NA's :20
## SO.BB Pos Team Player.additional
## Min. :0.310 Length:435 Length:435 Length:435
## 1st Qu.:0.960 Class :character Class :character Class :character
## Median :1.260 Mode :character Mode :character Mode :character
## Mean :1.354
## 3rd Qu.:1.610
## Max. :4.370
##
One of the greatest baseball managers of all time, Earl Weaver of the Baltimore Orioles, famously said “your most precious possessions on offense are your 27 outs.” On-base percentage (OBP) is a useful statistic because it measures a hitter’s ability to avoid making an out. It improves on earlier statistics like batting average because it also credits a batter for drawing walks and being hit by pitches, which batting average ignores.
\(OBP = \frac{H+BB+HBP}{AB+BB+HBP+SF}\)
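As a worked example of the formula, here is a minimal R sketch. The function name `obp` and the batting line are invented for illustration, and treating unrecorded HBP or SF counts as zero is an assumption of this sketch rather than a documented Baseball Reference convention.

```r
# On-base percentage from counting stats.
# Assumption: missing (NA) HBP or SF counts are treated as 0.
obp <- function(H, BB, HBP, AB, SF) {
  HBP <- ifelse(is.na(HBP), 0, HBP)
  SF  <- ifelse(is.na(SF), 0, SF)
  (H + BB + HBP) / (AB + BB + HBP + SF)
}

# Invented batting line: 30 hits, 10 walks, 0 HBP, 100 at-bats, SF not recorded
example_obp <- obp(H = 30, BB = 10, HBP = 0, AB = 100, SF = NA)
example_obp

# Batting average for the same line, which ignores the 10 walks
example_ba <- 30 / 100
example_ba
```

For this line the OBP is 40/110, or about .364, against a batting average of .300; the gap comes entirely from the walks.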
OBP was made famous by Michael Lewis’s 2003 book Moneyball (later adapted into a movie), which chronicled the 2002 Oakland Athletics and their data-driven quest to build a competitive roster with a much lower payroll than other MLB teams. In a memorable scene from the movie, Brad Pitt, playing general manager Billy Beane, explains to the older, less data-driven scouts why he pursues certain players. Over their objections about superficial factors like weight or off-field issues, Beane simply states, “He gets on base.”
Here is an interactive plot of OBP on the x-axis and slugging percentage (SLG, essentially a measure of power) on the y-axis for Negro Leaguers from 1920 to 1948 with at least 100 plate appearances.
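A plot of this kind can be sketched with the `ggplot2` and `plotly` packages. This is one possible approach, not necessarily how the original graphic was built; the data frame `hitters` and its values are invented placeholders, and both packages are assumed to be installed.

```r
library(ggplot2)
library(plotly)

# Invented placeholder rows; a real analysis would use the full hitter table.
hitters <- data.frame(
  Player = c("Hitter A", "Hitter B", "Hitter C", "Hitter D"),
  OBP    = c(0.281, 0.316, 0.351, 0.420),
  SLG    = c(0.284, 0.339, 0.396, 0.550)
)

# Static scatter plot; the `text` aesthetic is not drawn by ggplot2 itself
# (it may warn about it) but is picked up by plotly for the hover tooltip.
p <- ggplot(hitters, aes(x = OBP, y = SLG, text = Player)) +
  geom_point() +
  labs(x = "On-base percentage (OBP)", y = "Slugging percentage (SLG)")

# Convert to an interactive widget that shows the player name on hover
fig <- ggplotly(p, tooltip = "text")
```

Evaluating `fig` at the end of an R Markdown chunk or in an interactive session renders the widget.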